Abstract

In an influential 1981 paper, Guibas and Odlyzko constructed a generating function for the number of length n strings over a finite alphabet that avoid all members of a given set of forbidden substrings. Here we extend this result to the case in which the strings are weighted. This investigation was inspired by the problem of counting compositions of an integer n that avoid all compositions of a smaller integer m, a notion which arose from the consideration of one-sided random walks.

1 Introduction 

In [3] Guibas and Odlyzko construct a generating function for the number of 
length n strings over a finite alphabet that avoid all members of a given set of 
forbidden substrings. Here we assign a weight to each letter of the alphabet, 
define the weight of a string to be the sum of the weights of its letters, and 
determine a generating function for the number of weight n strings that avoid 
a particular set of forbidden substrings. This investigation was inspired by the 
problem of counting compositions of an integer n (which can be viewed as weight 
n strings over the alphabet {1, 2, 3, ...}) that do not contain a composition of
a smaller integer m occurring in consecutive positions (i.e., avoid a substring 
of weight m). This latter problem arose from the consideration of one-sided 
random walks, which are introduced here, and further investigated by Bender, 
Lawler, Pemantle, and Wilf in [1]. 

Heubach and Kitaev have also extended Guibas and Odlyzko's results from words to compositions. In [1] they consider length (number of parts) and weights in compositions over alphabets of the form {1, 2, ..., n}. In this paper we consider arbitrary weighted alphabets. For more on the combinatorics of compositions and words, see [5].

2 Forbidden Substrings 

In this section we recall Guibas and Odlyzko's theorem concerning forbidden 
substrings. 



A set $S = \{A, B, \ldots, T\}$ of strings over an alphabet is reduced if no string contains another as a substring. (In particular, no string in $S$ is empty.) Let $f(n)$ denote the number of length $n$ strings that avoid each member of $S$. For each string $H$ in $S$ let $f_H(n)$ denote the number of length $n$ strings that end with $H$ and avoid all members of $S$ except for the single appearance of $H$ at the end.

Define generating functions
$$F(z) = \sum_{n \ge 0} f(n)\, z^{-n} \qquad \text{and} \qquad F_H(z) = \sum_{n \ge 0} f_H(n)\, z^{-n}.$$

The correlation of two strings $G$ and $H$, denoted $GH$, is a string over $\{0, 1\}$ with the same length as $G$. The $i$th character from the left in $GH$ is determined by placing $H$ under $G$ so that the leftmost character of $H$ is under the $i$th character (from the left) in $G$. If all the pairs of characters in the overlapping segment are identical, then the $i$th character of $GH$ is 1. If not, it is 0. For example, if $\Omega = \{a, b\}$, $G = ababba$, and $H = abbab$, then $GH = 001001$, as illustrated below.

      ababba
    0 abbab
    0  abbab
    1   abbab
    0    abbab
    0     abbab
    1      abbab

Let $GH_z$ denote the correlation of $G$ and $H$ interpreted as a polynomial in the variable $z$: the $i$th character of $GH$ is the coefficient of $z^{\ell - i}$, where $\ell$ is the length of $G$. In the above example, $GH_z = z^3 + 1$.
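The correlation is easy to compute mechanically. The following Python sketch (the helper names `correlation` and `correlation_exponents` are hypothetical, not from the paper) follows the definition literally, comparing $H$ with $G$ on each overlapping segment, and reproduces $GH = 001001$ and $GH_z = z^3 + 1$ for the example above.

```python
def correlation(g: str, h: str) -> str:
    """Correlation GH: the i-th bit is 1 iff H, placed under G starting at
    the i-th character of G, agrees with G on the whole overlapping segment."""
    bits = []
    for i in range(len(g)):
        overlap = min(len(g) - i, len(h))  # length of the overlapping segment
        bits.append("1" if g[i:i + overlap] == h[:overlap] else "0")
    return "".join(bits)


def correlation_exponents(g: str, h: str) -> list[int]:
    """Exponents in GH_z: a 1 in (0-based) position i contributes z^(len(g) - 1 - i)."""
    return [len(g) - 1 - i for i, bit in enumerate(correlation(g, h)) if bit == "1"]


print(correlation("ababba", "abbab"))            # 001001
print(correlation_exponents("ababba", "abbab"))  # [3, 0], i.e. z^3 + 1
```

Swapping the arguments generally changes the result, which is one way to see that correlation is not commutative.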

Theorem 1 (Guibas, Odlyzko) Given a reduced set $S = \{A, B, \ldots, T\}$ of strings over an alphabet of $q \ge 2$ characters, the generating functions $F(z), F_A(z), F_B(z), \ldots, F_T(z)$ satisfy the following system of linear equations:

$$
\begin{aligned}
(z - q)F(z) + zF_A(z) + zF_B(z) + \cdots + zF_T(z) &= z\\
F(z) - z\,AA_z F_A(z) - z\,BA_z F_B(z) - \cdots - z\,TA_z F_T(z) &= 0\\
F(z) - z\,AB_z F_A(z) - z\,BB_z F_B(z) - \cdots - z\,TB_z F_T(z) &= 0\\
&\;\;\vdots\\
F(z) - z\,AT_z F_A(z) - z\,BT_z F_B(z) - \cdots - z\,TT_z F_T(z) &= 0
\end{aligned}
$$

The fact that S is reduced guarantees this system is nonsingular, and we 
can solve for the generating functions as rational functions of z. 
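For a single forbidden string ($S = \{A\}$) the system reduces to $F(z) = z\,AA_z / \bigl((z - q)AA_z + 1\bigr)$. In powers of $x = 1/z$ this is $C(x) / \bigl((1 - qx)C(x) + x^L\bigr)$, where $L$ is the length of $A$ and $C(x) = \sum_{i=1}^{L} c_i x^{i-1}$ is built from the bits $c_i$ of $AA$. The Python sketch below (all function names hypothetical) expands this quotient as a power series and compares it with a brute-force count; it is an illustration of the one-string case, not a general solver for the system.

```python
from itertools import product


def self_correlation_bits(a: str) -> list[int]:
    """Bits c_1, ..., c_L of AA: c_i = 1 iff the suffix starting at i is also a prefix."""
    n = len(a)
    return [1 if a[i:] == a[:n - i] else 0 for i in range(n)]


def series_quotient(num: list[int], den: list[int], nterms: int) -> list[int]:
    """First nterms power-series coefficients of num(x) / den(x); assumes den[0] == 1."""
    out: list[int] = []
    for n in range(nterms):
        s = num[n] if n < len(num) else 0
        for k in range(1, min(n, len(den) - 1) + 1):
            s -= den[k] * out[n - k]
        out.append(s)
    return out


def avoiders_from_theorem(a: str, q: int, nterms: int) -> list[int]:
    """f(0), f(1), ... for S = {a} over q letters, from F = C(x) / ((1 - q x) C(x) + x^L)."""
    c = self_correlation_bits(a)
    L = len(a)
    den = [0] * (L + 1)
    for k in range(L + 1):
        if k < L:
            den[k] += c[k]            # C(x)
        if k >= 1:
            den[k] -= q * c[k - 1]    # -q x C(x)
    den[L] += 1                       # x^L
    return series_quotient(c, den, nterms)


def avoiders_brute_force(a: str, alphabet: str, n: int) -> int:
    """Number of length-n strings over alphabet that do not contain a."""
    return sum(1 for w in product(alphabet, repeat=n) if a not in "".join(w))


print(avoiders_from_theorem("11", 2, 8))  # [1, 2, 3, 5, 8, 13, 21, 34]
print([avoiders_brute_force("11", "01", n) for n in range(8)])
```

Binary strings avoiding $11$ give the Fibonacci numbers, a familiar check that the signs in the system are right.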



3 Weighted Strings 

Theorem 1 shows us how to construct a generating function for the number of length n strings that avoid each member of a given set of forbidden substrings. In this section we extend this result to the case in which the strings are weighted, and count weight n strings.
A weighted alphabet $w(\Omega)$ assigns to each letter $h$ a weight $w_h$. The weight of a string $H = h_1 h_2 \cdots h_t$ over $w(\Omega)$ is the sum $w_H = \sum_{i=1}^{t} w_{h_i}$ of the weights of the individual letters. A set $S$ of weighted strings is reduced if no string contains any other as a substring.

Given a reduced set $S$ of strings over a weighted alphabet $w(\Omega)$, let $f(n)$ denote the number of weight $n$ strings that do not contain any substring in $S$. Similarly, for each $H$ in $S$ let $f_H(n)$ denote the number of weight $n$ strings that end with $H$ and do not contain any substring in $S$ except for the single appearance of $H$ at the end. Define
$$F(z) = \sum_{n \ge 0} f(n)\, z^{-n} \qquad \text{and} \qquad F_H(z) = \sum_{n \ge 0} f_H(n)\, z^{-n}.$$

Note that $f(0) = 1$ counts the empty string (the empty composition), while $f_H(n) = 0$ for $n$ less than the weight $w_H$ of $H$.

Next we extend the notion of correlation for two strings to a weighted version. For the ordinary correlation $GH$ of two strings $G = g_1 g_2 \cdots g_r$ and $H$, the $i$th character from the left is 1 if and only if $G$ and $H$ overlap on the string $g_i g_{i+1} \cdots g_r$. The weighted correlation $w(GH)$ is a multiset, and the weight $w_{g_i} + w_{g_{i+1}} + \cdots + w_{g_r}$ of the string on which $G$ and $H$ overlap is in $w(GH)$. More specifically, for any two strings $G = g_1 g_2 \cdots g_r$ and $H = h_1 h_2 \cdots h_t$ over a weighted alphabet $w(\Omega)$, the weighted correlation $w(GH)$ is a (possibly empty) multiset. This multiset contains $k$ if and only if there is an $i$ such that $h_1 = g_i$, $h_2 = g_{i+1}$, $\ldots$, $h_{r-i+1} = g_r$, and $k = w_{h_1} + w_{h_2} + \cdots + w_{h_{r-i+1}}$ is the weight of the overlap; each such $i$ contributes one copy of $k$.

For example, let $w(\Omega) = \{1, 2, 3, \ldots\}$ with $w_i = i$. Set $A = 3$, $B = 21$, $C = 12$, and $D = 111$. Then $w(AA) = w(BB) = w(CC) = \{3\}$, $w(CB) = \{2\}$, $w(BC) = w(DC) = w(BD) = \{1\}$, and $w(DD) = \{1, 2, 3\}$. The remaining weighted correlations are empty. Note that neither correlation nor weighted correlation is commutative in general.
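The weighted correlation can be computed straight from the definition. In this Python sketch (the helper `weighted_correlation` is hypothetical), strings are tuples of letters, `weight` maps a letter to its weight, and the multiset $w(GH)$ is returned as a sorted list; with $w_i = i$ it reproduces the example above.

```python
def weighted_correlation(g: tuple, h: tuple, weight) -> list:
    """Multiset w(GH) as a sorted list: one entry (the overlap weight) for each i
    such that the suffix g_i ... g_r of G equals the prefix h_1 ... h_(r-i+1) of H."""
    r = len(g)
    result = []
    for i in range(r):
        overlap = r - i
        # The overlap must fit inside H and agree with its prefix letter by letter.
        if overlap <= len(h) and g[i:] == h[:overlap]:
            result.append(sum(weight(letter) for letter in g[i:]))
    return sorted(result)


# Compositions: the letters are the positive integers and w_i = i.
S = {"A": (3,), "B": (2, 1), "C": (1, 2), "D": (1, 1, 1)}
for G, g in S.items():
    for H, h in S.items():
        wc = weighted_correlation(g, h, weight=lambda i: i)
        if wc:  # print only the nonempty multisets
            print(f"w({G}{H}) = {wc}")
```

The loop prints exactly the eight nonempty weighted correlations listed in the example.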

Finally we define $w(GH)_z$ to be the polynomial $\sum_{k \in w(GH)} z^k$. When $w(GH) = \emptyset$, the polynomial $w(GH)_z$ is 0. Thus $w(DD)_z = z^3 + z^2 + z$, for example. We now prove an extension of Theorem 1.

Theorem 2 Given a reduced set $S = \{A, B, \ldots, T\}$ of strings over a weighted alphabet $w(\Omega)$, the generating functions $F(z), F_A(z), F_B(z), \ldots, F_T(z)$ satisfy the following system of linear equations:

$$
\begin{aligned}
\frac{z-2}{z-1}\,F(z) + F_A(z) + F_B(z) + \cdots + F_T(z) &= 1\\
F(z) - w(AA)_z F_A(z) - w(BA)_z F_B(z) - \cdots - w(TA)_z F_T(z) &= 0\\
F(z) - w(AB)_z F_A(z) - w(BB)_z F_B(z) - \cdots - w(TB)_z F_T(z) &= 0\\
&\;\;\vdots\\
F(z) - w(AT)_z F_A(z) - w(BT)_z F_B(z) - \cdots - w(TT)_z F_T(z) &= 0
\end{aligned}
$$

Proof. The first equation in the above system follows from the observation that $f(n+1) + f_A(n+1) + \cdots + f_T(n+1) = f(n) + f(n-1) + \cdots + f(0)$. This recurrence holds because any string $h_1 h_2 \cdots h_t$ counted by one of $f(n+1), f_A(n+1), f_B(n+1), \ldots, f_T(n+1)$ arises by appending the character $h_t$ to the string $h_1 h_2 \cdots h_{t-1}$ counted by $f(n+1-h_t)$. The right hand side of the recurrence is the coefficient of $1/z^n$ in $\frac{z}{z-1}F(z)$, and the left hand side is the coefficient of $1/z^n$ in $z[F(z) - 1] + zF_A(z) + zF_B(z) + \cdots + zF_T(z)$. (Recall $f(0) = 1$, but $f_A(0) = f_B(0) = \cdots = f_T(0) = 0$.) Equating the two series and dividing by $z$ gives $\bigl(1 - \frac{1}{z-1}\bigr)F(z) + F_A(z) + \cdots + F_T(z) = 1$, which is the first equation.
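The same computation can be written for a general weighted alphabet, which shows where the alphabet $\{1, 2, 3, \ldots\}$ enters. This is a sketch that assumes every weight $w_h$ is a positive integer and only finitely many letters share each weight, so that removing the last letter gives $f(n) + \sum_{H \in S} f_H(n) = \sum_{h \in \Omega} f(n - w_h)$ for $n \ge 1$, with $f(m) = 0$ for $m < 0$:

$$
\begin{aligned}
f(n) + \sum_{H \in S} f_H(n) &= \sum_{h \in \Omega} f(n - w_h) \qquad (n \ge 1),\\
F(z) - 1 + \sum_{H \in S} F_H(z) &= \Bigl(\sum_{h \in \Omega} z^{-w_h}\Bigr) F(z),\\
\Bigl(1 - \sum_{h \in \Omega} z^{-w_h}\Bigr) F(z) + \sum_{H \in S} F_H(z) &= 1.
\end{aligned}
$$

For $\Omega = \{1, 2, 3, \ldots\}$ with $w_i = i$, as in the proof, $\sum_{i \ge 1} z^{-i} = 1/(z-1)$, so the coefficient of $F(z)$ is $1 - 1/(z-1) = (z-2)/(z-1)$, the coefficient in the first equation of Theorem 2. For other weighted alphabets the same argument gives the coefficient $1 - \sum_{h} z^{-w_h}$ instead.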

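The remaining equations result from the fact that for any $H$ in $S$ we have
$$f(n) = \sum_{k \in w(AH)} f_A(n+k) + \sum_{k \in w(BH)} f_B(n+k) + \cdots + \sum_{k \in w(TH)} f_T(n+k).$$
To see this, let $H = h_1 h_2 \cdots h_t$ and suppose $Y = y_1 y_2 \cdots y_s$ is any string counted by $f(n)$. Let $Z = z_1 z_2 \cdots z_{s+t} = y_1 y_2 \cdots y_s h_1 h_2 \cdots h_t$ denote the concatenation of strings $Y$ and $H$. Now $Z$ contains at least one string in $S$ as a substring. Let $G = g_1 g_2 \cdots g_r$ denote the leftmost such substring. Then for some $u > s$ we have $g_1 g_2 \cdots g_r = z_{u-r+1} \cdots z_{u-1} z_u$, and $z_1 z_2 \cdots z_u$ is counted by $f_G(n+k)$ for some $k \in w(GH)$. Here $u > s$ because $Y$ avoids $S$, and $u - r + 1 \le s + 1$ because $S$ is reduced, so $z_{s+1} \cdots z_u$ is both a suffix of $G$ and a prefix of $H$; its weight is the $k$ in question.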

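Conversely, if $k \in w(GH)$, then any string counted by $f_G(n+k)$ arises in this way from the concatenation of a string $Y$ counted by $f(n)$ and $H$. Thus the equality holds. Since
$$\sum_{n \ge 0} \sum_{k \in w(GH)} f_G(n+k)\, z^{-n} = \sum_{k \in w(GH)} z^k \sum_{n \ge 0} f_G(n+k)\, z^{-(n+k)} = w(GH)_z F_G(z),$$
where the last step uses $f_G(m) = 0$ for $m < k \le w_G$, we obtain the remaining equations in the system. ■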



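As was the case with Theorem 1, the fact that $S$ is reduced guarantees the system is nonsingular. To see this, consider the determinant

$$
\psi(z) = \det\begin{pmatrix}
\frac{z-2}{z-1} & 1 & 1 & \cdots & 1\\
1 & -w(AA)_z & -w(BA)_z & \cdots & -w(TA)_z\\
1 & -w(AB)_z & -w(BB)_z & \cdots & -w(TB)_z\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & -w(AT)_z & -w(BT)_z & \cdots & -w(TT)_z
\end{pmatrix}.
$$

Since $S$ is reduced, the highest degree polynomial in each column occurs on the diagonal: $w(GG)_z$ has degree $w_G$, while every overlap of $G$ with a different string of $S$ is a proper suffix of $G$ and so has weight less than $w_G$. When we expand $\psi(z)$, we have $z - 1$ in the denominator and a unique highest degree monomial produced by the product of the diagonal terms in the numerator. The degree of this monomial is the sum $1 + w_A + w_B + \cdots + w_T$. Hence $\psi(z) \neq 0$, and we can solve for $F(z), F_A(z), F_B(z), \ldots, F_T(z)$ and find that each is a rational function of $z$.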



4 Compositions 

The inspiration for extending Theorem 1 to Theorem 2 came from the problem of counting compositions of an integer $n$ that avoid compositions of a smaller integer $m$ occurring in consecutive positions. For example, the composition $2 + 4 + 1 + 1 + 4$ of $n = 12$ contains the compositions $2 + 4$, $4 + 1 + 1$, and $1 + 1 + 4$ of 6 in consecutive positions, while avoiding all compositions of $m = 3$ in consecutive positions. (Note that it does contain the composition $2 + 1$ of 3 in nonconsecutive positions.)
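Finding the compositions that occur in consecutive positions amounts to scanning every contiguous run of parts. A minimal Python sketch (the helper `consecutive_blocks_with_sum` is hypothetical) checks the example $2 + 4 + 1 + 1 + 4$:

```python
def consecutive_blocks_with_sum(parts: list[int], m: int) -> list[tuple[int, ...]]:
    """All runs of consecutive parts whose sum is m, in order of starting position."""
    blocks = []
    for i in range(len(parts)):
        total = 0
        for j in range(i, len(parts)):
            total += parts[j]
            if total == m:
                blocks.append(tuple(parts[i:j + 1]))
            if total >= m:  # parts are positive, so the running sum only grows
                break
    return blocks


composition = [2, 4, 1, 1, 4]
print(consecutive_blocks_with_sum(composition, 6))  # [(2, 4), (4, 1, 1), (1, 1, 4)]
print(consecutive_blocks_with_sum(composition, 3))  # []
```

The early `break` relies on the parts being positive integers, which is always the case for compositions.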

We can apply Theorem 2 to find, for example, a generating function for the number of compositions of $n$ that avoid all compositions of $m = 3$ occurring in consecutive positions. To do so we view compositions as words over $w(\Omega) = \{1, 2, 3, \ldots\}$ with $w_i = i$. For example, we identify the composition $2 + 4 + 1 + 1 + 4$ with the word $24114$. Set $S = \{A = 3, B = 21, C = 12, D = 111\}$. The number of compositions of $n$ that avoid all compositions of 3 occurring in consecutive positions is given by the number of weight $n$ strings over $w(\Omega)$ which do not contain any substring in $S$. Let $f(n)$ denote this number. For each $H$ in $S$, let $f_H(n)$ denote the number of weight $n$ strings which end with $H$ and contain no substring in $S$ except for the single occurrence of $H$ at the end.

Set $F(z) = \sum_{n \ge 0} f(n)\, z^{-n}$ and $F_H(z) = \sum_{n \ge 0} f_H(n)\, z^{-n}$. Earlier we recorded the weighted correlation $w(GH)$ for each pair of strings in $S$. We use this information to form the table below. The polynomial $w(GH)_z$ appears in row $H$ and column $G$.

        A      B      C      D
  A     z^3    0      0      0
  B     0      z^3    z^2    0
  C     0      z      z^3    z
  D     0      z      0      z^3 + z^2 + z

Theorem 2 guarantees the generating functions satisfy the following system of equations:

$$
\begin{aligned}
\frac{z-2}{z-1}\,F(z) + F_A(z) + F_B(z) + F_C(z) + F_D(z) &= 1\\
F(z) - z^3 F_A(z) &= 0\\
F(z) - z^3 F_B(z) - z^2 F_C(z) &= 0\\
F(z) - z F_B(z) - z^3 F_C(z) - z F_D(z) &= 0\\
F(z) - z F_B(z) - (z^3 + z^2 + z) F_D(z) &= 0
\end{aligned}
$$



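Solving this system yields

$$
\begin{aligned}
F(z) &= \frac{z^8 - 2z^5 + z^3}{z^8 - z^7 - z^6 + z^5 - z^4 - z^3 - z^2 + z + 1}\\
&= 1 + \frac{1}{z} + \frac{2}{z^2} + \frac{2}{z^4} + \frac{3}{z^5} + \frac{9}{z^6} + \frac{12}{z^7} + \frac{20}{z^8} + \cdots
\end{aligned}
$$

as desired. (There is no $1/z^3$ term, since every composition of 3 contains itself.)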
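The expansion can be checked by brute force. This Python sketch (helper names hypothetical) enumerates the compositions of $n$, discards those with a run of consecutive parts summing to 3, and compares the counts with the coefficients of $F(z)$ written in $x = 1/z$, namely $(1 - 2x^3 + x^5)/(1 - x - x^2 + x^3 - x^4 - x^5 - x^6 + x^7 + x^8)$.

```python
def compositions(n: int):
    """Yield every composition of n as a tuple of positive parts."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def has_consecutive_sum(parts: tuple, m: int) -> bool:
    """True if some run of consecutive parts sums to exactly m."""
    for i in range(len(parts)):
        total = 0
        for j in range(i, len(parts)):
            total += parts[j]
            if total >= m:
                if total == m:
                    return True
                break
    return False


def brute_force_counts(m: int, nmax: int) -> list[int]:
    """f(0), ..., f(nmax): compositions avoiding every composition of m consecutively."""
    return [sum(1 for c in compositions(n) if not has_consecutive_sum(c, m))
            for n in range(nmax + 1)]


def series_quotient(num: list[int], den: list[int], nterms: int) -> list[int]:
    """Power-series coefficients of num(x) / den(x); assumes den[0] == 1."""
    out: list[int] = []
    for n in range(nterms):
        s = num[n] if n < len(num) else 0
        for k in range(1, min(n, len(den) - 1) + 1):
            s -= den[k] * out[n - k]
        out.append(s)
    return out


# F(z) with x = 1/z, coefficients listed from x^0 upward.
num = [1, 0, 0, -2, 0, 1]
den = [1, -1, -1, 1, -1, -1, -1, 1, 1]
print(series_quotient(num, den, 9))  # [1, 1, 2, 0, 2, 3, 9, 12, 20]
print(brute_force_counts(3, 8))
```

The brute-force count is exponential in $n$, so it only serves as a check on small cases; the rational function gives every coefficient through a linear recurrence.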

5 Motivating Problem 

The question of composition avoidance arose from the consideration of board games in which a roll of one or more (fair, 6-sided) dice determines the number of "squares" a player moves forward on a given turn. Some squares are undesirable to land on, and one would like to know the probability of avoiding them, given the number of squares separating a particular "bad" square from one's current square.

To solve this problem, we replace the board, a finite number of squares through which we cycle repeatedly, with an infinite succession of squares extending in one direction. A sequence of dice rolls determines a one-sided random walk which begins on square 0 and continues through squares 1, 2, 3, and so on, landing on some squares while avoiding others. What is the probability that a one-sided random walk avoids square $m$?

We record a one-sided random walk as an "infinite composition" of positive integer parts. For example, $1 + 2 + 2 + \cdots$ indicates a sequence of rolls beginning with a roll of 1 followed by two rolls of 2. What is the probability that a one-sided random walk avoids an initial composition of $m$?

Let $P(m)$ denote the probability that a one-sided random walk begins with a composition of $m$, and define $P(0) = 1$. In the simplest case, we use a single die to determine the size of each step in the walk, and compute $P(m)$ using the observation that
$$P(m) = \tfrac16 P(m-1) + \tfrac16 P(m-2) + \tfrac16 P(m-3) + \tfrac16 P(m-4) + \tfrac16 P(m-5) + \tfrac16 P(m-6)$$
for $m \ge 1$, with $P(m) = 0$ for $m < 0$. From the recurrence we obtain the generating function
$$g(z) = \sum_{m \ge 0} P(m)\, z^m = \frac{1}{1 - \frac16\,(z + z^2 + z^3 + z^4 + z^5 + z^6)},$$
which converges for $|z| < 1$. Since $g(z)$ has a simple pole at $z = 1$, and the residue there is $-\frac{2}{7}$, we know $P(m) \approx \frac{2}{7}$ for large $m$. For large $m$, the probability that a one-sided random walk avoids square $m$ is therefore $1 - P(m) \approx \frac{5}{7}$. We can arrive at the same result using the fact that the recurrence for $P(m)$ has constant coefficients. Specifically, $P(m) = \frac{2}{7} + c_1 r_1^m + c_2 r_2^m + c_3 r_3^m + c_4 r_4^m + c_5 r_5^m$, where $|r_i| < 1$ for $1 \le i \le 5$.
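The residue can be checked directly. Writing $d(z)$ (a name used only here) for the denominator of $g(z)$:

$$
\begin{aligned}
d(z) &= 1 - \tfrac16\,(z + z^2 + z^3 + z^4 + z^5 + z^6), \qquad d(1) = 0,\\
d'(1) &= -\tfrac16\,(1 + 2 + 3 + 4 + 5 + 6) = -\tfrac{7}{2},\\
\operatorname{Res}_{z=1} g(z) &= \frac{1}{d'(1)} = -\frac{2}{7}.
\end{aligned}
$$

For $|z| \le 1$ with $z \ne 1$ we have $\bigl|\tfrac16 \sum_{i=1}^{6} z^i\bigr| < 1$, so $d$ has no other zeros in the closed unit disk. Hence $g(z) - \frac{2/7}{1 - z}$ is analytic on a disk of radius greater than 1, its coefficients tend to 0, and $P(m) \to \frac{2}{7}$, the reciprocal of the mean roll $\frac{7}{2}$.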

More generally, we can consider one-sided random walks in which $p_i$ is the probability of moving $i$ squares on a given turn. When $p_i$ is determined by the roll of two dice, we obtain $P(m) \approx \frac{1}{7}$. The notion of one-sided random walks is considered further by Bender, Lawler, Pemantle, and Wilf in [1]. They compute, for instance, the probability of a "collision" when two players take simultaneous one-sided random walks. If $C(m)$ is the probability of a collision for the first time on square $m$, then
$$\sum_{m} C(m)\, x^{m} = 1 - \cdots$$
where $p(z) = \sum_{i \ge 1} p_i z^i$.
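The limits $\frac{2}{7}$ (one die) and $\frac{1}{7}$ (two dice) can also be checked numerically by iterating the recurrence $P(m) = \sum_i p_i P(m - i)$ with exact rational arithmetic. In this Python sketch the function `landing_probabilities` is a hypothetical name, and the step distribution is passed as a dictionary mapping $i$ to $p_i$.

```python
from fractions import Fraction


def landing_probabilities(step_probs: dict[int, Fraction], mmax: int) -> list[Fraction]:
    """P(0), ..., P(mmax) with P(0) = 1 and P(m) = sum_i p_i P(m - i), P(negative) = 0."""
    P = [Fraction(1)]
    for m in range(1, mmax + 1):
        P.append(sum((p * P[m - i] for i, p in step_probs.items() if i <= m), Fraction(0)))
    return P


one_die = {i: Fraction(1, 6) for i in range(1, 7)}

two_dice: dict[int, Fraction] = {}
for a in range(1, 7):
    for b in range(1, 7):
        two_dice[a + b] = two_dice.get(a + b, Fraction(0)) + Fraction(1, 36)

P1 = landing_probabilities(one_die, 200)
P2 = landing_probabilities(two_dice, 200)
print(float(P1[200]), 2 / 7)
print(float(P2[200]), 1 / 7)
```

Exact fractions avoid any doubt about floating-point drift; in both cases the limit is the reciprocal of the mean step ($\frac{7}{2}$ for one die, 7 for two dice).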

It is easy to count finite compositions of an integer $n$ that avoid an initial composition of $m < n$. Compositions that begin with an initial composition of $m$ have the form $\tau + \sigma$, where $\tau$ is a composition of $m$ and $\sigma$ is a composition of $n - m$. There are $2^{m-1} \cdot 2^{n-m-1} = 2^{n-2}$ compositions of $n$ that begin with a composition of $m$, and therefore also $2^{n-1} - 2^{n-2} = 2^{n-2}$ that avoid an initial composition of $m$, independent of our choice for $m$. In other words, the probability that a randomly selected composition of $n$ avoids an initial composition of $m < n$ is $\frac12$, the same as the probability that it doesn't, for all such $m$.
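A brute-force check of the count $2^{n-2}$, as a Python sketch with hypothetical helper names: a composition $\tau + \sigma$ begins with a composition of $m$ exactly when one of its partial sums equals $m$, which is what `begins_with_composition_of` tests.

```python
from itertools import accumulate


def compositions(n: int):
    """Yield every composition of n as a tuple of positive parts."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def begins_with_composition_of(parts: tuple, m: int) -> bool:
    """True if some partial sum of parts equals m, i.e. parts = tau + sigma with tau a composition of m."""
    return m in accumulate(parts)


n = 8
for m in range(1, n):
    count = sum(1 for c in compositions(n) if begins_with_composition_of(c, m))
    print(m, count, 2 ** (n - 2))
```

Each line shows the brute-force count next to $2^{n-2}$. Equivalently, a composition of $n$ is a choice of a subset of the $n - 1$ possible cut points, and requiring a cut at $m$ leaves $2^{n-2}$ subsets.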

The mathematical literature contains numerous results concerning permutations and multiset permutations (which can be viewed as compositions) that avoid particular patterns, i.e., permutations on fewer letters (see [2], for example, for an introduction to the field). The above investigation can be framed in this context as follows. We know we can easily count compositions of $n$ that avoid an initial composition of $m < n$. This result suggests the more general goal of counting compositions of $n$ that avoid a composition of $m$ anywhere. We can interpret this statement in several ways:

1. Count compositions of $n$ that avoid all compositions of $m$ occurring in consecutive positions.

2. Count compositions of $n$ that avoid a particular composition $\tau$ of $m$ occurring in consecutive positions.

3. Count compositions of $n$ that avoid a particular composition $\tau$ of $m$ in (possibly) nonconsecutive positions.

4. Count compositions of $n$ that avoid all compositions of $m$ in (possibly) nonconsecutive positions.

Here we have solved 1 and 2 with Theorem 2. Problem 3 is straightforward, and 4 is open.

The problems above use the word "avoid" in a narrow sense compared to that for patterns. We can define a notion of composition avoidance analogous to that for pattern avoidance. To do so we view the compositions of $n$ as multiset permutations. For example, the compositions $1 + 1 + 2$, $1 + 2 + 1$, and $2 + 1 + 1$ correspond to the permutations 112, 121, and 211 of the multiset $\{1^2, 2\}$. We identify the compositions of $n = 4$ with permutations of the multisets $\{4\}$, $\{1, 3\}$, $\{2^2\}$, $\{1^2, 2\}$, and $\{1^4\}$. It is well known that the number of permutations of a set of $n$ letters that avoid a pattern $\pi$ of 3 letters is independent of $\pi$. Since the same result holds for multisets (see [6] or [7]), we see that the number of compositions of $n$ that avoid a "composition pattern" (I suggest the term motif) $\pi$ with 3 distinct parts is independent of the parts. It would be interesting to investigate motif avoidance for other motifs. The $(1 + 2)$-avoiding compositions are the partitions. How about the $(1 + 2 + 1)$-avoiding compositions?
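Motif avoidance is easy to explore by brute force. In this Python sketch (helper names hypothetical), a composition contains a motif $\pi$ when some subsequence of its parts is order-isomorphic to $\pi$: equal entries of $\pi$ must be matched by equal parts, and smaller entries by smaller parts. It recovers the partition numbers for the motif $1 + 2$, lets one compare the six motifs with 3 distinct parts, and prints the first few counts for $1 + 2 + 1$ without claiming anything about them.

```python
from itertools import combinations, permutations


def compositions(n: int):
    """Yield every composition of n as a tuple of positive parts."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def contains_motif(parts: tuple, motif: tuple) -> bool:
    """True if some subsequence of parts is order-isomorphic to motif."""
    k = len(motif)
    for idx in combinations(range(len(parts)), k):
        sub = [parts[i] for i in idx]
        if all(sign(sub[s] - sub[t]) == sign(motif[s] - motif[t])
               for s in range(k) for t in range(s + 1, k)):
            return True
    return False


def count_avoiders(n: int, motif: tuple) -> int:
    return sum(1 for c in compositions(n) if not contains_motif(c, motif))


print([count_avoiders(n, (1, 2)) for n in range(1, 11)])  # [1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
for motif in permutations((1, 2, 3)):
    print(motif, [count_avoiders(n, motif) for n in range(1, 10)])
print([count_avoiders(n, (1, 2, 1)) for n in range(1, 10)])
```

The six rows for the motifs with 3 distinct parts coincide, as the multiset result predicts; the search over all subsequences is exponential, so this is only practical for small $n$.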